Abstract. We study a random code ensemble with a hierarchical structure, which
is closely related to the generalized random energy model with discrete energy values.
Based on this correspondence, we analyze the hierarchical random code ensemble by
using the replica method in two situations: lossy data compression and channel coding.
In both situations, the large-deviation exponents characterizing the
performance of the ensemble, namely the distortion rate of lossy data compression and the
error exponent of channel coding in Gallager's formalism, are accessible through a generating
function of the generalized random energy model. We argue that the transitions of
those exponents observed in the preceding work can be interpreted as phase transitions
with respect to the replica number. We also show that replica symmetry breaking
plays an essential role in these transitions.
1. Introduction

Signal processing is one of the main topics in information science and is gaining ever more
significance in modern society. Statistical mechanical approaches to signal processing have
accordingly been investigated for decades and have provided various novel
viewpoints on information theory [1, 2].

Among the various models in information theory, the random code ensemble is known
as a fundamental one. This ensemble was introduced by Shannon [3, 4] and was found to
achieve the optimal error-correction performance stated in the channel coding theorem,
which he himself established. After the original study, Gallager [5] reinforced its significance
by refining Shannon's result. In the context of statistical mechanics,
this ensemble can be viewed as a fundamental spin-glass model: in certain limits, it
corresponds to the random energy model (REM) proposed and rigorously analyzed by
Derrida [6, 7]. This relation was first pointed out by Sourlas [8]. His work has been
recognized as an epoch-making result and was followed by numerous works, such as [9, 10, 11]
on decoding and [12, 13, 14, 15, 16] on performance-achieving codes.

As a generalization of the REM, a model with a hierarchical structure, termed
the generalized random energy model (GREM), was also proposed and rigorously solved
in [17, 18, 19]. The original motivation of the generalization was to clarify the relation
of the GREM to other mean-field spin-glass models such as the Sherrington-Kirkpatrick
model [20]. In a recent work, Merhav [21] proposed a random code
ensemble with a hierarchical structure for performance improvement and argued that
such a hierarchical ensemble has a structure similar to that of the GREM. Based on this
similarity, he investigated two issues by large deviation analysis: distortion in lossy
data compression, and the performance of the Bayesian decoder in channel coding through
the binary symmetric channel (BSC). For lossy data compression, he concluded that,
for higher performance, the parameters describing the hierarchical structure should be
tuned to a range where the GREM shows the same thermodynamic behavior as the
standard REM. He also argued that the same tuning of hierarchical parameters for
optimal performance holds in channel coding. For a decisive conclusion, however, more
detailed investigations are desirable. A crucial point is that taking the ensemble average for
performance evaluation requires the quenched average, whereas his analysis adopted
the simpler annealed average, although he gave some justifications.

Under these circumstances, we reinvestigate the hierarchical random code ensemble
in a more comprehensive way by using the replica method, which enables us to evaluate the
performance of the code with the quenched average. In our recent work [22], we analyzed
the GREM by the replica method and found that multiple-step replica symmetry
breaking (RSB) appears at low temperatures in the quenched limit. The quenched and
the annealed limits are connected with each other in a region where the replica number
is positive. This positive replica region becomes important for the large deviation
analysis of the random code ensemble. We analyze this region in detail and see that
similar RSB transitions appear again. They play a crucial role in the transitions
of the distortion rate and Gallager's error exponent [21, 23], which directly concern the
performance of the random code.

The actual analysis is performed on a generalized discrete random energy model
(GDREM). This model, in which the possible values of the random energy are discrete unlike
in the original REM, can be seen as a generalization of the discrete REM in [24, 25, 26]. We
apply the replica analysis to the GDREM and obtain the phase diagram for the region
of a non-negative replica number. The GDREM is directly mapped to the hierarchical
random code ensemble. This mapping enables us to readily interpret the properties of
the GDREM in the context of the random code. Phase transitions involving the higher-step
RSB found in the GDREM are directly connected to those in the distortion rate
and Gallager's error exponent. We emphasize that the transitions in the region of a
positive replica number are not merely theoretical matters in the replica analysis, but
also have practical significance in information theory. The physical interpretation of
the behavior of the distortion rate and Gallager's error exponent constitutes a part of the main
results of this paper.

This paper is organized as follows. In section 2 we introduce the GDREM and
analyze the phase diagram using the replica method. We show that many phases appear
on the diagram of temperature versus the replica number. In section 3 we briefly review
the discussion of distortion in lossy data compression, and compare the result from our
replica analysis of the GDREM with [21]. As shown there, the replica analysis enables
us to investigate the distortion rate quite readily. The result indicates that the higher-step
RSB degrades the performance of a general hierarchical code. Error correction by
the Bayesian decoder is studied in section 4, where Gallager's error exponent is rederived
from our result. We show that two-parameter optimization probably becomes significant
when correlations between codewords exist. We also point out that the concentration of
the Gibbs measure can be strongly related to the performance analysis of the Bayesian
decoder. The last section is devoted to the conclusion.

2. The GDREM 

In this section we introduce and analyze the GDREM. The REM [6, 7] is one of the
fundamental models of spin glasses, and in its definition the energy of each state
is taken as random. Derrida and Gardner generalized the REM into the GREM
in their subsequent works [17, 18, 19] by incorporating a hierarchical structure in
the random energy. In the original works on the REM and the GREM, the probability
distribution of the energy is Gaussian, whereas the GDREM dealt with here is a
model of discrete random energy. In the following we study the GDREM with the
binomial distribution of hierarchical random energy.
2.1. Random variable representation

First we give the definition of the GDREM. We follow the notation for the GREM in
our paper [22]. Prepare $K$ hierarchical levels, and to the $\nu$th level ($1 \le \nu \le K$) assign random
variables $\epsilon_\nu(1), \epsilon_\nu(2), \ldots, \epsilon_\nu(M_\nu)$. These random variables $\{\epsilon_\nu\}$ become the
energy components of the $\nu$th hierarchy. The number of independent random variables
for the $\nu$th level can be factored as

$$M_\nu = (\alpha_1 \cdots \alpha_\nu)^N, \tag{1}$$

where $\alpha_\nu^N$ is an integer satisfying $1 < \alpha_\nu^N < 2^N$ and denotes the number of independent
random variables $\{\epsilon_\nu\}$ belonging to a state in the $(\nu - 1)$st level (see figure 1). For the
deepest level $\nu = K$, $M_K = (\alpha_1 \cdots \alpha_K)^N = 2^N$ must hold.

From the random variables, we introduce $2^N$ new variables $\{E_i\}$, which represent
the energy of the system and are defined as

$$E_i = \sum_{\nu=1}^{K} \epsilon_\nu\left( \lfloor (i-1) M_\nu / 2^N \rfloor + 1 \right) \equiv \sum_{\nu=1}^{K} \epsilon_\nu^{(i)}, \tag{2}$$

where $i = 1, \ldots, 2^N$ and $\lfloor x \rfloor$ denotes the floor function, i.e. the largest integer not
exceeding $x$. This structure is depicted in figure 1.




Figure 1. Schematic picture of the hierarchical random energy. Here the case of
$K = 3$, $N = 4$ and $\{\alpha_1^N, \alpha_2^N, \alpha_3^N\} = \{4, 2, 2\}$ is depicted. From the root to a leaf
(corresponding to a state $i$) of the tree, we sum up $\epsilon_\nu(j)$, which gives the energy $E_i$ of
the $i$th state.



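Then, the partition function is defined by

$$Z(\beta) = \sum_{i=1}^{2^N} e^{-\beta E_i} = \sum_{i=1}^{2^N} \exp\left( -\beta \sum_{\nu=1}^{K} \epsilon_\nu\left( \lfloor (i-1) M_\nu / 2^N \rfloor + 1 \right) \right), \tag{3}$$

with $\beta = 1/T$ being the inverse temperature.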

The properties of this model are determined by the distribution of the random
variables $\{\epsilon_\nu\}$. Here we choose the binomial distribution to see the connection with
the random code ensemble. For the $\nu$th level, the distribution is characterized by a
parameter $L_\nu$. The specific form is

$$P(\epsilon_\nu) = 2^{-L_\nu} \sum_{E=0}^{L_\nu} \binom{L_\nu}{E} \delta\left( \epsilon_\nu, E - \frac{L_\nu}{2} \right), \tag{4}$$

where $\delta(x, y)$ is the Kronecker delta function. The number of possible values of $\epsilon_\nu$ is
$L_\nu + 1$; namely, $\epsilon_\nu$ can take any of $-L_\nu/2, -L_\nu/2 + 1, \ldots, L_\nu/2 - 1, L_\nu/2$. For later
convenience, we define the parameters $a_\nu = L_\nu / N$ and $a = \sum_{\nu=1}^{K} a_\nu$. In contrast to the
case of the Gaussian REM, the value of the parameter $a$ is significant. The RSB occurs
only for $a > 1$ as discussed in [21], which is the case we study in the following.
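The construction in (1), (2) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`level_sizes`, `component_index`, `sample_energies`) are hypothetical, the branching numbers $\alpha_\nu^N$ and the block lengths $L_\nu$ are passed as plain lists, and each $\epsilon_\nu(j)$ is drawn as the sum of $L_\nu$ fair bits shifted by $-L_\nu/2$, which reproduces the binomial form (4).

```python
import random

def level_sizes(branching):
    """M_nu = product of the first nu branching numbers alpha_1^N ... alpha_nu^N, as in (1)."""
    sizes, m = [], 1
    for b in branching:
        m *= b
        sizes.append(m)
    return sizes

def component_index(i, m_nu, n_states):
    """1-based index floor((i - 1) * M_nu / 2^N) + 1 appearing in (2)."""
    return (i - 1) * m_nu // n_states + 1

def sample_energies(branching, bits_per_level, seed=0):
    """Return the list E_1, ..., E_{2^N} for one realization of the random variables."""
    rng = random.Random(seed)
    sizes = level_sizes(branching)
    n_states = sizes[-1]  # equals 2^N at the deepest level
    # eps[nu][j - 1]: Binomial(L_nu, 1/2) count shifted by -L_nu / 2, as in (4)
    eps = [[sum(rng.getrandbits(1) for _ in range(L)) - L / 2 for _ in range(m)]
           for m, L in zip(sizes, bits_per_level)]
    return [sum(eps[nu][component_index(i, m, n_states) - 1]
                for nu, m in enumerate(sizes))
            for i in range(1, n_states + 1)]

# The case of figure 1: K = 3, N = 4, branching numbers {4, 2, 2}.
energies = sample_energies([4, 2, 2], [4, 2, 2], seed=1)
print(len(energies), min(energies), max(energies))
```

States that share an ancestor in the tree share the corresponding energy component, which is the only source of correlation between the $E_i$ in this sketch.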



2.2. Bit representation 

Here we give another definition of the GDREM by using bit variables to deal with the
hierarchical random code ensemble.

Prepare $aN$ bits taking the value 0 or 1, and divide them into $K$ blocks as
$aN = \sum_{\nu=1}^{K} L_\nu$. For the $\nu$th block composed of $L_\nu$ bits, we randomly choose $M_\nu$ bit
configurations from the $2^{L_\nu}$ possible ones, denoted by $z_\nu(1), z_\nu(2), \ldots, z_\nu(M_\nu)$, where each
$z_\nu$ has $L_\nu$ components. The $i$th configuration $x_i$ is expressed by arraying the respective
configuration of each block as

$$x_i = \left\{ z_1\left( \lfloor (i-1) M_1 / 2^N \rfloor + 1 \right), \ldots, z_K\left( \lfloor (i-1) M_K / 2^N \rfloor + 1 \right) \right\}, \tag{5}$$

namely $x_i$ is composed of $aN$ elements. This procedure constructs $2^N$ bit configurations
from the $2^{aN}$ possible ones. The resultant set of chosen configurations, which is denoted as
$C$ hereafter, has a hierarchy with $K$ levels, as in the random variable representation.

After the construction of the states, we define the Hamming distance, which counts the
number of different bits between the bit sequences $x_i$ and $y$, as

$$d_H(x_i, y) = \sum_{\nu=1}^{K} \sum_{l=1}^{L_\nu} \left[ 1 - \delta\left( z_{\nu l}\left( \lfloor (i-1) M_\nu / 2^N \rfloor + 1 \right), y_{\nu l} \right) \right]. \tag{6}$$

Here, $z_{\nu l}$ is the $l$th component of the bit configuration $z_\nu$, $y_{\nu l}$ is the corresponding bit of $y$,
and $y$ is a reference bit sequence, which may be chosen as the simplest one such as the all-zero sequence $0$. Using
the Hamming distance, we define the energy of the $i$th state as $E_i = d_H(x_i, 0) - aN/2$,
which leads to the partition function of this system as

$$Z(\beta) = \sum_{i=1}^{2^N} \exp\left\{ -\beta \left( d_H(x_i, 0) - \frac{aN}{2} \right) \right\}. \tag{7}$$

The ensemble of bit sequences given here is nothing but the hierarchical random code
ensemble introduced in [21]. The energy has the same hierarchical structure as in (2),
and the energy of each block is drawn from the binomial distribution in (4), which
means that the representation (7) is equivalent to (3). As we see later, this expression
is convenient for the discussion of signal processing.
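The bit representation can be sketched in the same spirit. The following Python fragment is a minimal illustration, not the construction used in [21]: the names `build_codebook`, `energy` and `partition_function` are hypothetical, and the block configurations $z_\nu(j)$ are assumed to be drawn independently and uniformly (with replacement) from the $2^{L_\nu}$ possibilities.

```python
import math
import random

def build_codebook(branching, bits_per_level, seed=0):
    """Assemble the 2^N codewords x_i of (5) from randomly drawn blocks z_nu(j)."""
    rng = random.Random(seed)
    sizes, m = [], 1
    for b in branching:          # M_nu = alpha_1^N * ... * alpha_nu^N
        m *= b
        sizes.append(m)
    n_states = sizes[-1]
    blocks = [[tuple(rng.getrandbits(1) for _ in range(L)) for _ in range(m_nu)]
              for m_nu, L in zip(sizes, bits_per_level)]
    codebook = []
    for i in range(1, n_states + 1):
        word = ()
        for nu, m_nu in enumerate(sizes):
            # 0-based position of block floor((i - 1) M_nu / 2^N) + 1
            word += blocks[nu][(i - 1) * m_nu // n_states]
        codebook.append(word)
    return codebook

def energy(word):
    """E_i = d_H(x_i, 0) - aN/2; the distance to the all-zero word is the bit sum."""
    return sum(word) - len(word) / 2

def partition_function(codebook, beta):
    """Z(beta) of (7) for one realization of the codebook."""
    return sum(math.exp(-beta * energy(w)) for w in codebook)

codebook = build_codebook([4, 2, 2], [4, 2, 2], seed=3)
print(len(codebook), partition_function(codebook, 1.0))
```

Codewords in the same first-level group share their first block, so their Hamming distances to any reference sequence are correlated through that block.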
2.3. Replica analysis

We analyze the GDREM by the replica method. As is well known, the replica method
is a tool for taking the ensemble average of the logarithm or of an arbitrary power of the partition
function. This method is thus well suited to the performance evaluation of the
hierarchical random code ensemble, because the ensemble average of an arbitrary power
of the partition function is exactly what is required, as we see in the following sections. The scheme
is the same as demonstrated in [22]. The only difference is in the probability distribution
function of the energy. We briefly sketch the main result here.

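Let us evaluate the replicated partition function $Z^n$ of the GDREM. When $n$ is a
natural number, $Z^n$ can be written as

$$Z^n(\beta) = \sum_{i_1=1}^{2^N} \cdots \sum_{i_n=1}^{2^N} \exp\left( -\beta \sum_{\nu=1}^{K} \left( \epsilon_\nu^{(i_1)} + \epsilon_\nu^{(i_2)} + \cdots + \epsilon_\nu^{(i_n)} \right) \right) = \sum_{i_1=1}^{2^N} \cdots \sum_{i_n=1}^{2^N} \exp\left( -\beta \sum_{\nu=1}^{K} \sum_{j=1}^{M_\nu} n_\nu(j, \{i_a\}) \epsilon_\nu(j) \right), \tag{8}$$

where $\epsilon_\nu^{(i)}$ stands for the component $\epsilon_\nu(\lfloor (i-1) M_\nu / 2^N \rfloor + 1)$ entering $E_i$,

$$n_\nu(j, \{i_a\}) = \sum_{a=1}^{n} \mathbb{I}\left( j = \lfloor (i_a - 1) M_\nu / 2^N \rfloor + 1 \right), \tag{9}$$

and $\mathbb{I}$ is the indicator function

$$\mathbb{I}(A) = \begin{cases} 1 & \text{if } A \text{ holds} \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

The ensemble average yields

$$[Z^n] = \sum_{i_1=1}^{2^N} \cdots \sum_{i_n=1}^{2^N} \exp\left( N \sum_{\nu=1}^{K} \sum_{j=1}^{M_\nu} a_\nu \ln\cosh \frac{\beta n_\nu(j, \{i_a\})}{2} \right) = \sum_{\{n_\nu\}} \exp\left[ S(\{n_\nu\}) + N \sum_{\nu=1}^{K} \sum_{j=1}^{M_\nu} a_\nu \ln\cosh \frac{\beta n_\nu(j, \{i_a\})}{2} \right], \tag{11}$$

where $[\ ]$ means the ensemble average and $S(\{n_\nu\})$ is the entropy function defined as the
logarithm of the number of configurations giving $\{n_\nu\}$. In deriving (11) we should take
care that the distribution of the energy is binomial, which is the only difference from
our preceding work [22]. In the thermodynamic limit $N \to \infty$, we need to calculate the
saddle-point contribution of $[Z^n]$. A generating function $\phi(\beta, n) = \lim_{N \to \infty} \ln [Z^n] / N$ is
convenient for this purpose, and is also significant for signal processing as seen later.
In the rest of this section, we focus on calculating this generating function $\phi(\beta, n)$.
Hereafter we restrict ourselves to the cases of $K = 1, 2$ for simplicity.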
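The step from (8) to (11) rests on a single identity: for a component $\epsilon = E - L/2$ with $E$ binomially distributed as in (4), the average of $\exp(-x\epsilon)$ equals $\cosh^L(x/2)$, which with $x = \beta n_\nu$ and $L = a_\nu N$ gives the factor $\exp(N a_\nu \ln\cosh(\beta n_\nu / 2))$. A short numerical check, offered as an illustrative sketch (the function name `binomial_moment` and the parameter values are arbitrary choices), is:

```python
import math

def binomial_moment(L, x):
    """Average of exp(-x * eps) for eps = E - L/2 with E ~ Binomial(L, 1/2), as in (4)."""
    return sum(math.comb(L, E) * math.exp(-x * (E - L / 2)) for E in range(L + 1)) / 2**L

L, beta, n_nu = 6, 0.8, 3
lhs = binomial_moment(L, beta * n_nu)        # direct average over the binomial law
rhs = math.cosh(beta * n_nu / 2) ** L        # closed form entering (11)
print(lhs, rhs)
```

The two numbers agree to floating-point accuracy, since the sum is just the binomial expansion of $(e^{x/2} + e^{-x/2})^L / 2^L$.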

Practically, we need $\phi(\beta, n)$ for general $n \in \mathbb{R}$, even though the expression (11)
is valid only for $n \in \mathbb{N}$. To bridge the gap, we utilize the replica method for analytic
continuation from natural to real numbers with the Parisi ansatz [27, 28, 29]. For
readers not familiar with these procedures, we refer to [1]. Here we demonstrate a part
of the calculations for the case $K = 2$.

According to the standard prescription using the Parisi ansatz, it is sufficient for
the current case to consider the replica symmetric (RS) and the one-step RSB (1RSB)
solutions in each hierarchy [22, EI]. If the 1RSB occurs in both hierarchies with
different block sizes, it can be interpreted as the two-step RSB (2RSB). Each solution
can be graphically expressed by how $n$ "balls" are partitioned into $2^N$ "boxes" (figure
2).
Figure 2. Graphical representation of the possible saddle-point solutions at $K = 2$.
The horizontal and vertical axes represent the index of configurations and the number
of "balls", respectively. All configurations are divided into $\alpha_1^N$ groups, each including $\alpha_2^N$
configurations.



For each hierarchy, there are two RS solutions: the RS solutions of the first and
the second sorts (RS1 and RS2, respectively). For the RS1 solution all $n$ balls are
distributed to different states in the hierarchy, while for the RS2 solution all $n$ balls are
in the same state in the hierarchy. For example, the RS2-RS1 solution is
RS2 in the first hierarchy and RS1 in the second one. The entropy
of this solution is calculated as

$$S(\{n_\nu\}) = \ln\left\{ \alpha_1^N \alpha_2^N (\alpha_2^N - 1) \cdots (\alpha_2^N - (n-1)) \right\} \approx N (\ln \alpha_1 + n \ln \alpha_2), \tag{12}$$
and the energetic term becomes

$$\sum_{\nu=1}^{K} \sum_{j=1}^{M_\nu} a_\nu \ln\cosh \frac{\beta n_\nu(j, \{i_a\})}{2} = a_1 \ln\cosh \frac{\beta n}{2} + n a_2 \ln\cosh \frac{\beta}{2}. \tag{13}$$

These yield the generating function $\phi(\beta, n)$ as

$$\phi(\beta, n) = \ln \alpha_1 + a_1 \ln\cosh \frac{\beta n}{2} + n \left( \ln \alpha_2 + a_2 \ln\cosh \frac{\beta}{2} \right). \tag{14}$$

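The other solutions are evaluated similarly, and we therefore skip the derivation. For
the RSB solutions, there exist additional parameters (such as $m$ and $m_{1,2}$). These
parameters are chosen to extremize $\phi(\beta, n)$, and the explicit dependence on those
parameters vanishes in the final step. The possible solutions are summarized as follows:

$$\phi(\beta, n) = \begin{cases}
n \left( \ln 2 + a \ln\cosh \frac{\beta}{2} \right) & \text{(RS1-RS1)} \\
\ln \alpha_1 + a_1 \ln\cosh \frac{\beta n}{2} + n \left( \ln \alpha_2 + a_2 \ln\cosh \frac{\beta}{2} \right) & \text{(RS2-RS1)} \\
\ln 2 + a \ln\cosh \frac{\beta n}{2} & \text{(RS2-RS2)} \\
\frac{\beta n a_1}{2} \tanh \frac{\beta_1}{2} + n \left( \ln \alpha_2 + a_2 \ln\cosh \frac{\beta}{2} \right) & \text{(1RSB-RS1)} \\
\ln \alpha_1 + a_1 \ln\cosh \frac{\beta n}{2} + \frac{\beta n a_2}{2} \tanh \frac{\beta_2}{2} & \text{(RS2-1RSB)} \\
\frac{\beta n a}{2} \tanh \frac{\beta_c}{2} & \text{(1RSB-1RSB)} \\
\frac{\beta n}{2} \left( a_1 \tanh \frac{\beta_1}{2} + a_2 \tanh \frac{\beta_2}{2} \right) & \text{(2RSB)}
\end{cases} \tag{15}$$

where the critical temperature $\beta_c$ is defined by the equation

$$R + \ln\cosh \frac{\beta_c}{2} - \frac{\beta_c}{2} \tanh \frac{\beta_c}{2} = 0, \tag{16}$$

with $R = \ln 2 / a$. The other critical temperatures $\beta_1$ and $\beta_2$ are defined by the same equation
(16) with the substitutions $R = \ln \alpha_1 / a_1 = R_1$ and $R = \ln \alpha_2 / a_2 = R_2$, respectively.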
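Equation (16) has no closed-form solution, but its left-hand side decreases monotonically from $R$ at $\beta_c = 0$ to $R - \ln 2$ as $\beta_c \to \infty$, so a finite root exists exactly when $0 < R < \ln 2$ and can be found by bisection. The following Python sketch illustrates this; the function names and the tolerance are illustrative assumptions and are not part of the original analysis.

```python
import math

def log_cosh(x):
    """Numerically stable ln cosh(x)."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def rsb_condition(beta, R):
    """Left-hand side of equation (16)."""
    half = beta / 2.0
    return R + log_cosh(half) - half * math.tanh(half)

def critical_beta(R, tol=1e-12):
    """Solve (16) for beta_c by bisection; requires 0 < R < ln 2."""
    if not 0.0 < R < math.log(2.0):
        raise ValueError("a finite root requires 0 < R < ln 2")
    lo, hi = 0.0, 1.0
    while rsb_condition(hi, R) > 0.0:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rsb_condition(mid, R) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# beta_1 and beta_2 follow from the same routine with R replaced by R_1 and R_2.
print(critical_beta(0.2), critical_beta(0.3), critical_beta(0.4))
```

Because the root increases with $R$, the ordering of $R_1$, $R$ and $R_2$ directly fixes the ordering of $\beta_1$, $\beta_c$ and $\beta_2$.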

Next, we choose the correct solutions from the above seven candidates for $\phi(\beta, n)$,
depending on the values of the parameters. We first summarize the case $K = 1$, which is
naturally included in the above result. For $K = 1$, the distinction between the first
and the second hierarchies is meaningless, which means that the correct solutions are chosen
from the RS1-RS1, RS2-RS2 and 1RSB-1RSB solutions (hence abbreviated as RS1, RS2
and 1RSB in the $K = 1$ case). When the solution of (16) exists, i.e. $R < \ln 2$ holds,
we have three phases on the $T$-$\beta n$ plane, as investigated in [23]. The phase diagram in
this case is given in figure 3 (left). On the other hand, for the case $\ln 2 < R$, there is no
phase transition and the RS1 solution dominates the whole $T$-$\beta n$ plane, which is of
no interest.

In the case of $K = 2$, the interesting case is again $R < \ln 2$, i.e. $\beta_c$ has a finite
value. Moreover, we should distinguish three cases depending on the values of $\beta_{c,1,2}$.

First, for $R_2 < R_1$, the GDREM shows the same behavior as the standard discrete
REM, as discussed in [22]. Hence, further investigation is not necessary in this case.

Second, for the case $R_1 < \ln 2 < R_2$, where $\beta_2$ does not have a finite value, we
have three phases: the RS2-RS1, RS1-RS1 and 1RSB-RS1 phases. In this case the
second hierarchy is always in the RS1 phase and only the first hierarchy shows phase
transitions. In other words, the system exhibits a phase structure similar to that of $K = 1$. We
focus on the properties of the hierarchical system here, and therefore we skip this case.

Figure 3. Typical phase diagrams for $K = 1$ (left) and $K = 2$ with $\ln \alpha_1 / a_1 < \ln \alpha_2 / a_2$
and $\beta_2 < \infty$ (right). $T_{c,1,2}$ are defined as the inverses of $\beta_{c,1,2}$, respectively.

The last and most interesting case is $R_1 < R_2 < \ln 2$. This condition means
that all the critical temperatures have finite values and $\beta_1 < \beta_c < \beta_2$. In this case,
there are six phases. The resultant phase diagram is depicted in figure 3 (right).
To obtain this phase diagram, we basically determine the contributing phase by the
maximization principle based on the saddle-point method. In addition, we need some
mathematical and physical criteria such as the continuity of $\phi(\beta, n)$ and the
non-negativity of entropy [22]. For instance, let us return to $K = 1$ for simplicity. The
boundary curve between the RS1 and RS2 phases is obtained by equating $\phi(\beta, n)$ for
both phases. The vertical phase boundary between the RS1 and 1RSB phases is
derived by considering the entropy crisis, which is widely known to be identified with the
spin-glass transition. The horizontal boundary between the RS2 and 1RSB phases should also
exist, as described in [23]: the RS2 phase cannot reach the quenched limit $\beta n = 0$ because
it leads to unphysical behavior, e.g. $\lim_{n \to 0} \phi(\beta, n) / n \to \infty$. Thus, the dominant phase
should naturally shift to another phase as $\beta n$ decreases. These discussions can also be
applied to $K = 2$, where multiple-step RSB occurs and consequently a partial
entropy crisis is observed, as mentioned in [22].

In the subsequent sections, we move on to the discussions of lossy data compression
and channel coding. The actual evaluation of the performance of the random code ensemble
is conducted in the range $R_1 < R_2 < \ln 2$. The phase diagram (figure 3) and the function
$\phi(\beta, n)$ are of great use for this analysis, which explicitly demonstrates the advantage
of the replica method.
3. Lossy data compression

We start with a review of lossy data compression in [21]. This issue has also
been investigated by statistical mechanics [30, 31, 32, 33], and we concentrate on the
hierarchical code here. After establishing how the generating function $\phi(\beta, n)$ relates
to lossy data compression, we apply the result of the replica analysis in the previous
section.



3.1. Distortion rate 

We prepare the hierarchical random code ensemble $C$ of size $2^N$, consisting of $aN$-bit
hierarchical sequences ($a > 1$), as in [21] or equivalently in section 2.2. An arbitrary
$aN$-bit hierarchical sequence is represented by one of the elements in $C$, which amounts
to the process of lossy data compression. $2^N$ sequences out of the $2^{aN}$ possible ones have
a one-to-one correspondence with the elements in $C$, whereas the others are distorted.
To assess the performance of the compression process, we define the distortion (more precisely,
the Hamming distortion) for the signal $x$ as

$$\Delta(x) = \min_{\hat{x} \in C} \left( d_H(x, \hat{x}) - \frac{aN}{2} \right), \tag{17}$$

where $x$ and $\hat{x}$ are $aN$-bit sequences. The subtraction of $aN/2$ in the definition is for
simplification of the analysis. To extract more information about the
distortion, it is appropriate to define a characteristic function of the
distortion [21],

$$\Psi(s) = \left[ \exp(-s \Delta(x)) \right]_{x, C}, \tag{18}$$

that is, the moment generating function of the distortion. The brackets $[\ ]_{x, C}$ denote
the average over $x$ and over the ensemble of the code. Actually, we may fix the bit sequence
$x$ as $x = 0$ and remove the average over $x$, because we take the average over the random
code ensemble $[\ ]_C$. In the large $aN$ limit, the rate of $\Psi(s)$, denoted by $\psi(s)$ and defined
as follows, characterizes the performance of the random code ensemble:

$$\psi(s) = -\lim_{N \to \infty} \frac{\ln \Psi(s)}{aN} = -\lim_{N \to \infty} \frac{\ln \left[ \exp(-s \Delta(0)) \right]_C}{aN}. \tag{19}$$

This distortion rate $\psi(s)$ has a direct relation with the generating function $\phi(\beta, n)$ of the
GDREM. To see this, we should remember that the partition function of the GDREM,
$Z(\beta)$, can be written in the bit representation. The distortion $\Delta(0)$ then corresponds
to the ground-state energy of the GDREM. Accordingly, the following transformation
leads to the relation with the replicated partition function of the GDREM:

$$\exp(-s \Delta(0)) = \lim_{n \to 0} \left( \sum_{x \in C} \exp\left\{ -\frac{s}{n} \left( d_H(x, 0) - \frac{aN}{2} \right) \right\} \right)^{n} = \lim_{n \to 0} Z^n\left( \frac{s}{n} \right). \tag{20}$$
After taking the average over the hierarchical random code ensemble, we have

$$\psi(s) = -\lim_{N \to \infty} \lim_{n \to 0} \frac{1}{aN} \ln \left[ Z^n\left( \frac{s}{n} \right) \right]_C = -\lim_{n \to 0} \frac{1}{a} \phi\left( \frac{s}{n}, n \right). \tag{21}$$

Consequently, we can directly assess the distortion rate $\psi(s)$ from the generating
function $\phi(\beta, n)$ in the replica analysis.

To summarize, the distortion rate is accessible from the replica analysis using the
function $\phi(\beta, n)$ with the constraint $s = \beta n$ and the limit $n \to 0$. This means that
the phases contributing to $\psi(s)$ are on the $\beta n$ axis in the $T$-$\beta n$ diagram, where there
exist phase transitions with respect to $s = \beta n$, as we see in section 2.3. As a result,
those transitions lead to changes in the functional form of the distortion rate.
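The limit used in (20) can be illustrated on a small, fixed list of energies: as $n \to 0$ with $\beta = s/n$, the replicated partition function $Z^n(s/n)$ converges to $\exp(-s E_{\min})$, where $E_{\min}$ plays the role of the distortion $\Delta(0)$. The sketch below uses a hypothetical energy list and a hypothetical function name, and factors out the minimum energy so that the evaluation stays numerically stable at large $\beta$.

```python
import math

def replicated_partition(energies, s, n):
    """Z^n(beta) at beta = s / n, evaluated relative to the minimum energy."""
    beta = s / n
    e_min = min(energies)
    # (sum_i exp(-beta E_i))^n = exp(-s e_min) * (sum_i exp(-beta (E_i - e_min)))^n
    tail = sum(math.exp(-beta * (e - e_min)) for e in energies)
    return math.exp(-s * e_min) * tail ** n

energies = [-1.5, -0.5, -0.5, 0.5, 1.5, 2.5]   # hypothetical small spectrum
s = 0.7
target = math.exp(-s * min(energies))           # exp(-s * Delta(0)) in (20)
errors = [abs(replicated_partition(energies, s, n) - target)
          for n in (1.0, 0.1, 0.01, 0.001)]
print(errors)
```

The deviation from the target shrinks as $n$ decreases, which is the finite-size counterpart of reading $\psi(s)$ off the $\beta n$ axis of the phase diagram.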



3.2. Result 

In the case of lossy data compression, the parameter $R = \ln 2 / a$, which controls the
phase transitions of the GDREM, has the meaning of the compression rate. Since
we deal with compression of data, the compression rate should be smaller than $\ln 2$, in
which case the RSB transitions occur as shown in section 2.3.



3.2.1. K = 1

To calculate the distortion rate $\psi(s)$, we take the limit $n \to 0$ while
keeping $\beta n = s$ in the function $\phi(\beta, n)$. Accordingly, the contributing phases
in the current problem turn out to be the RS2 and 1RSB phases. Using (15) and (21),
the distortion rate can be derived as

$$\psi(s) = \begin{cases}
-\frac{s}{2} \tanh \frac{\beta_c}{2} & \text{(1RSB) for } 0 \le s \le s_R \\
-\ln\cosh \frac{s}{2} - R & \text{(RS2) for } s_R \le s,
\end{cases} \tag{22}$$

where the transition point $s_R$ is given from (16) as the solution of

$$R + \ln\cosh \frac{s_R}{2} - \frac{s_R}{2} \tanh \frac{s_R}{2} = 0. \tag{23}$$

The above solution coincides with the result in [21]. In summary, the transition of the
distortion rate is interpreted as the phase transition on the $\beta n$ axis in the $T$-$\beta n$ diagram,
namely the transition between the RS2 and 1RSB phases.
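As a numerical illustration of (22) and (23), the following Python sketch evaluates the $K = 1$ distortion rate and checks that the two branches join continuously at $s_R$, which coincides with $\beta_c$ of (16). The helper names and the bisection tolerance are assumptions made for this example only.

```python
import math

def log_cosh(x):
    """Numerically stable ln cosh(x)."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def critical_beta(R, tol=1e-12):
    """Root of R + ln cosh(b/2) - (b/2) tanh(b/2) = 0 by bisection (0 < R < ln 2)."""
    if not 0.0 < R < math.log(2.0):
        raise ValueError("a finite root requires 0 < R < ln 2")
    f = lambda b: R + log_cosh(b / 2.0) - (b / 2.0) * math.tanh(b / 2.0)
    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def distortion_rate_k1(s, R):
    """psi(s) of (22); the transition point s_R solves (23), i.e. s_R = beta_c."""
    s_R = critical_beta(R)
    if s <= s_R:
        return -0.5 * s * math.tanh(s_R / 2.0)   # 1RSB branch
    return -log_cosh(s / 2.0) - R                # RS2 branch

R = 0.3
s_R = critical_beta(R)
print(s_R, distortion_rate_k1(s_R - 1e-6, R), distortion_rate_k1(s_R + 1e-6, R))
```

The slopes of the two branches also agree at $s_R$, since the derivative of the RS2 branch there is $-\frac{1}{2}\tanh(s_R/2)$, the constant slope of the 1RSB branch.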



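3.2.2. K = 2

We consider the case $R_1 < R_2 < \ln 2$ as mentioned in section 2.3. As in
figure 3, there exist three phases on the $\beta n$ axis: the RS2-RS2, RS2-1RSB and 2RSB phases.
Substituting these solutions into (21), we have

$$\psi(s) = \begin{cases}
-\frac{s}{2} \left( \frac{a_1}{a} \tanh \frac{\beta_1}{2} + \frac{a_2}{a} \tanh \frac{\beta_2}{2} \right) & \text{(2RSB) for } 0 \le s \le s_{R_1} \\
-\frac{a_1}{a} \ln\cosh \frac{s}{2} - \frac{a_1}{a} R_1 - \frac{a_2}{a} \frac{s}{2} \tanh \frac{\beta_2}{2} & \text{(RS2-1RSB) for } s_{R_1} \le s \le s_{R_2} \\
-\ln\cosh \frac{s}{2} - R & \text{(RS2-RS2) for } s_{R_2} \le s,
\end{cases} \tag{24}$$

where $s_{R_1}$ and $s_{R_2}$ are the solutions of equation (23) with the substitutions $R = R_1$ and
$R = R_2$, respectively. This also coincides with the result in [21].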
3.3. Discussion

To judge whether the hierarchical structure improves the performance of the code in
lossy data compression or not, we compare the averaged distortion $[\Delta(0)]_C = d\psi/ds|_{s=0}$
for the cases $K = 1$ and $K = 2$. The optimal case is $K = 1$, because it gives
the smallest distortion. This means that the introduction of a hierarchy of the current
sort has no positive effect on lossy data compression, which is the same conclusion
as in [21]. However, we here stress two advantages of our formulation.

First, our evaluation scheme is quite simple. We can treat both cases $R_1 < R_2$
and $R_2 < R_1$ in a unified framework and can easily see the relation between the cases
$K = 1$ and $K = 2$. Generalization to larger $K$ is also straightforward, whereas
such a generalization seems to involve many technical difficulties in the original analysis.

Second, in our approach the transitions observed in the distortion rate can be
understood as phase transitions with respect to the replica number, which include the
RSB. This can provide more useful insights into signal processing, including lossy data
compression. For example, we can apply the complexity analysis to the current problem.
The complexity, plotted in figure 4, is defined as the logarithm of the number
of pure states (see [22, 34] for details), which has a similar meaning to the entropy.
Generally speaking, the higher-step RSB leads to a decrease in the number of low-energy states, which
implies a rise in the ground-state energy (figure 4).



Figure 4. Schematic behavior of the complexity as a function of energy. The complexities
from the 1RSB and 2RSB solutions are drawn by the dashed and the solid curves,
respectively. $E^{\mathrm{1RSB}}$ and $E^{\mathrm{2RSB}}$ are the ground-state energies of the 1RSB and 2RSB
phases, respectively. For high-energy states both solutions give the same value of the
complexity, whereas for low-energy states the 2RSB solution yields the smaller one.

This directly elucidates the performance loss of the hierarchical random code
ensemble, because the distortion $\Delta(0)$ is identical to the ground-state energy of the
GDREM. This observation implies that the higher-step RSB generally degrades the
performance in lossy data compression.
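The comparison of the averaged distortions can be made concrete with the slopes at $s = 0$ of the 1RSB branch of (22) and the 2RSB branch of (24). The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the parameters $a_1 = a_2 = 1$ and $R_1 = 0.2$ are arbitrary example values, and $R_2$ is fixed through $a_1 R_1 + a_2 R_2 = \ln 2$, which follows from $\alpha_1 \alpha_2 = 2$.

```python
import math

def log_cosh(x):
    """Numerically stable ln cosh(x)."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def critical_beta(R, tol=1e-12):
    """Root of R + ln cosh(b/2) - (b/2) tanh(b/2) = 0 by bisection (0 < R < ln 2)."""
    if not 0.0 < R < math.log(2.0):
        raise ValueError("a finite root requires 0 < R < ln 2")
    f = lambda b: R + log_cosh(b / 2.0) - (b / 2.0) * math.tanh(b / 2.0)
    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def mean_distortion_k1(a):
    """Slope of the 1RSB branch of (22) at s = 0, with R = ln 2 / a."""
    return -0.5 * math.tanh(critical_beta(math.log(2.0) / a) / 2.0)

def mean_distortion_k2(a1, a2, R1):
    """Slope of the 2RSB branch of (24) at s = 0; R2 follows from a1*R1 + a2*R2 = ln 2."""
    R2 = (math.log(2.0) - a1 * R1) / a2
    if not R1 < R2 < math.log(2.0):
        raise ValueError("this sketch assumes R1 < R2 < ln 2")
    t1 = math.tanh(critical_beta(R1) / 2.0)
    t2 = math.tanh(critical_beta(R2) / 2.0)
    return -0.5 * (a1 * t1 + a2 * t2) / (a1 + a2)

k1 = mean_distortion_k1(2.0)            # K = 1 with a = 2
k2 = mean_distortion_k2(1.0, 1.0, 0.2)  # K = 2 with the same total a
print(k1, k2)
```

For these example parameters the $K = 1$ value is the smaller (more negative) one, in line with the conclusion that the hierarchy of the current sort does not reduce the distortion.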
4. Channel coding

In this section we move on to the problem of channel coding and see the relation with 
the replica method. Although the basic line of the analysis here is the same as in [24], 
there is a difference in the discussion of bound for the indicator function. 



4.1. General framework

Consider the BSC with flip probability $p$ ($0 < p < 1/2$). Following the framework
in section 2.2, we prepare an $N$-bit signal and encode it into a hierarchical $aN$-bit
signal ($a > 1$), which is included in the codebook $C$ of size $2^N$. Then, we transmit
an $aN$-bit code sequence $x$ through the BSC. The receiver decodes the original signal
from the $aN$-bit output $y$ by maximum likelihood decoding, which yields the inferred
sequence. In the above setting, we define the error probability $P_E(C)$ for a given set
of original signal and codebook. In particular, we focus on the value averaged over the
set of codebooks $C$, $[P_E(C)]_C$, whose expression is given by

$$[P_E(C)]_C = \left[ \sum_{x \in C} \sum_{y} P(x|C) P(y|x) \Delta_{\mathrm{ML}}(x, y|C) \right]_C. \tag{25}$$

$P(x|C)$ is the prior probability of the original signal $x$ and $P(y|x)$ is the conditional
probability characterizing the BSC. $\Delta_{\mathrm{ML}}(x, y|C)$ is the indicator function of
maximum likelihood decoding, which is zero for successful decoding and unity for failure.

For simplicity, we assume that the probability of the transmitted signal $x$ is uniform,
$P(x|C) = 2^{-N}$. The conditional probability of the BSC is readily calculated as

$$P(y|x) = (1-p)^{aN - d_H(x, y)} p^{d_H(x, y)} = \left( \frac{1}{2\cosh(F/2)} \right)^{aN} \exp\left\{ -F \left( d_H(x, y) - \frac{aN}{2} \right) \right\}, \tag{26}$$

where $F = \ln\{(1-p)/p\}$ (the Nishimori condition [1, 35]). Besides, we can take the
summation over $y$ in (25) and replace the reference codeword $y$ with $0$, because the
factor in $[\ ]$ becomes independent of $y$ owing to the summation $\sum_{x \in C}$ and the average $[\ ]_C$.
Substituting these, we obtain

$$[P_E(C)]_C = \frac{1}{2^N} \left( \frac{1}{\cosh(F/2)} \right)^{aN} \left[ \sum_{x \in C} \exp\left\{ -F \left( d_H(x, 0) - \frac{aN}{2} \right) \right\} \Delta_{\mathrm{ML}}(x, 0|C) \right]_C. \tag{27}$$

It is a formidable task to evaluate the indicator function directly, and its bound is 
usually discussed by using inequalities. For the hierarchical random code ensemble, it 
is convenient to use some different inequalities for different K. 
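The rewriting of the BSC likelihood in (26) is a purely algebraic identity, which the following Python sketch verifies numerically. The function names and the example values of $p$ and of the sequence length are illustrative assumptions.

```python
import math

def bsc_likelihood(d, total_bits, p):
    """P(y|x) for the BSC when x and y differ in d of total_bits positions."""
    return (1.0 - p) ** (total_bits - d) * p ** d

def bsc_likelihood_gibbs(d, total_bits, p):
    """The same quantity in the Boltzmann-like form of (26), with F = ln((1 - p) / p)."""
    F = math.log((1.0 - p) / p)
    prefactor = (2.0 * math.cosh(F / 2.0)) ** (-total_bits)
    return prefactor * math.exp(-F * (d - total_bits / 2.0))

p, total_bits = 0.1, 12
pairs = [(bsc_likelihood(d, total_bits, p), bsc_likelihood_gibbs(d, total_bits, p))
         for d in range(total_bits + 1)]
print(pairs[0], pairs[-1])
```

In this form the channel parameter $F$ plays the role of an inverse temperature conjugate to the shifted Hamming distance, which is what allows the GDREM partition function (7) to enter the error probability.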



Statistical mechanical analysis of a hierarchical random code ensemble 






4.2. Analysis and result

In the case of channel coding, the parameter $R = \ln 2 / a$ corresponds to the transmission
rate. For successful communication, we have $a > 1$ or equivalently $R < \ln 2$, where the
RSB phases play significant roles in the GDREM, as in lossy data compression.



4.2.1. K = 1

This case corresponds to the conventional random code, and we make
use of the inequality in Gallager's original work [5, 23],

$$\Delta_{\mathrm{ML}}(x, y|C) \le \left\{ \sum_{\hat{x} \in C \backslash x} \left( \frac{P(y|\hat{x})}{P(y|x)} \right)^{\lambda} \right\}^{n}. \tag{28}$$

The symbol $C \backslash x$ means the codebook $C$ with $x$ removed. In this inequality we can
take arbitrary non-negative real values of $\lambda$ and $n$, which should be optimized for
the tightest upper bound. In Gallager's work $\lambda$ was fixed as $\lambda = 1/(n+1)$ by using
Jensen's and Hölder's inequalities. In contrast, in our approach we can readily deal
with $\lambda$ without fixing it, which is one of the advantages of the current analysis.
Moreover, the condition $\lambda = 1/(n+1)$ may lead to a looser bound in some cases, as
indicated in [36, 37, 38]. Hence, we adopt the two-parameter optimization here.
Insertion of (26) and (28) into (27) yields

$$[P_E(C)]_C \le \frac{1}{2^N} \left( \frac{1}{\cosh(F/2)} \right)^{aN} \left[ \sum_{x \in C} \exp\left\{ -F(1 - n\lambda) \left( d_H(x, 0) - \frac{aN}{2} \right) \right\} \left( \sum_{\hat{x} \in C \backslash x} \exp\left\{ -F\lambda \left( d_H(\hat{x}, 0) - \frac{aN}{2} \right) \right\} \right)^{n} \right]_C. \tag{29}$$

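In the current case $K = 1$, codewords are not mutually correlated, which allows us to
rewrite the upper bound as

$$[P_E(C)]_C \le \frac{1}{2^N} \left( \frac{1}{\cosh(F/2)} \right)^{aN} \left[ \sum_{x \in C} \exp\left\{ -F(1 - n\lambda) \left( d_H(x, 0) - \frac{aN}{2} \right) \right\} \right]_C \left[ \left( \sum_{\hat{x} \in C \backslash x} \exp\left\{ -F\lambda \left( d_H(\hat{x}, 0) - \frac{aN}{2} \right) \right\} \right)^{n} \right]_C$$
$$= \frac{1}{2^N} \left( \frac{1}{\cosh(F/2)} \right)^{aN} \exp\left( N \left\{ \phi(F(1 - n\lambda), 1) + \phi(F\lambda, n) \right\} \right). \tag{30}$$

With regard to the last factor in the first line, the absence of $x$ in the sum over the codebook
$C$ can be neglected without loss of generality in the limit $N \to \infty$. Substituting the
trivial expression of $\phi(F(1 - n\lambda), 1)$ and optimizing $\lambda$ and $n$, we obtain the bound

$$[P_E(C)]_C \le \exp\left( -aN E_r^{K=1}(R) \right) \tag{31}$$

for $N \to \infty$, where $E_r^{K=1}(R)$ is known as Gallager's error exponent for the BSC,

$$E_r^{K=1}(R) = \max_{0 \le n, \lambda} \left\{ -\frac{1}{a} \phi(F\lambda, n) - \ln\cosh \frac{F(1 - n\lambda)}{2} + \ln\cosh \frac{F}{2} \right\}. \tag{32}$$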

Next, we apply the RS1, RS2 and 1RSB solutions in (15) to the computation of the
error exponent (32).

• RS1

The error exponent is expressed as

$$E_r^{K=1}(R) = \max_{0 \le n, \lambda} \left\{ -nR - n \ln\cosh \frac{F\lambda}{2} - \ln\cosh \frac{F(1 - n\lambda)}{2} + \ln\cosh \frac{F}{2} \right\}. \tag{33}$$

Combining the maximization conditions with respect to $n$ and $\lambda$, we have

$$R + \ln\cosh \frac{F\lambda}{2} - \frac{F\lambda}{2} \tanh \frac{F\lambda}{2} = 0, \tag{34}$$

leading to $F\lambda = \beta_c$. Clearly, this means that the optimal solution is given on the
RS1-1RSB boundary if $0 < n < 1$. Hence, we need not take this solution into
account because it can be included in the 1RSB solution. For $n > 1$ and $F\lambda = \beta_c$,
the correct $\phi(F\lambda, n)$ is not given by the RS1 solution, which also leads to the irrelevance of
the RS1 solution.

• RS2 

In this case the error exponent is 

$$ E_r^{K=1}(R) = \max_{0<n,\lambda} \left\{ -R - \ln\cosh\frac{F\lambda n}{2} - \ln\cosh\frac{F(1-\lambda n)}{2} + \ln\cosh\frac{F}{2} \right\}. \tag{35} $$

This form allows us to consider the maximization with respect to the product $\lambda n$, 
giving $\lambda n = 1/2$. The substitution yields 

$$ E_r^{K=1}(R) = -R - 2\ln\cosh\frac{F}{4} + \ln\cosh\frac{F}{2}. \tag{36} $$

Due to the functional form of the RS2 solution, where $\lambda$ and $n$ appear only in the 
product $\lambda n$, there is a redundancy in optimizing $\lambda$ and $n$. On the $T$-$\beta n$ plane, this 
redundancy means that all the points on the line $\beta n = F/2$ in the RS2 phase give the 
identical result (36). As $F$ decreases, this horizontal line moves down 
along the vertical axis $\beta n$ and finally reaches the phase boundary between the 
RS2 and 1RSB phases, which gives the bound of the RS2 solution, $2\beta_c < F$. This 
can be regarded as a simple graphical interpretation of the behavior of the error 
exponent. 

• 1RSB 

The error exponent is 

$$ E_r^{K=1}(R) = \max_{0<n,\lambda} \left\{ -\frac{F\lambda n}{2}\tanh\frac{\beta_c}{2} - \ln\cosh\frac{F(1-\lambda n)}{2} + \ln\cosh\frac{F}{2} \right\}. \tag{37} $$

Maximization with respect to $\lambda n$ yields $1 - \lambda n = \beta_c/F$. Substituting this, we have 

$$ \begin{aligned} E_r^{K=1}(R) &= -\frac{F}{2}\left( 1 - \frac{\beta_c}{F} \right)\tanh\frac{\beta_c}{2} - \ln\cosh\frac{\beta_c}{2} + \ln\cosh\frac{F}{2} \\ &= -\frac{F}{2}\tanh\frac{\beta_c}{2} + R + \ln\cosh\frac{F}{2}. \end{aligned} \tag{38} $$

Again, on the $T$-$\beta n$ plane, this result is independent of temperature, as in the 
RS2 case. This solution gives $E_r = 0$ at $F = \beta_c$, which is the bound of successful error 
correction in the infinite-size limit. 
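
As an illustrative numerical check (not part of the original analysis), the stationarity conditions quoted above for the RS2 and 1RSB cases, $\lambda n = 1/2$ and $1 - \lambda n = \beta_c/F$, can be verified by a brute-force grid search over the product $u = \lambda n$. The sketch below assumes the bracketed expressions of (35) and (37); the function names and the sample values of $F$, $\beta_c$ and $R$ are hypothetical and chosen only for illustration.

```python
import math

def lncosh(x):
    """Numerically stable ln(cosh(x))."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def rs2_objective(u, F, R):
    # Bracketed expression of (35), written as a function of u = lambda * n.
    return -R - lncosh(F * u / 2.0) - lncosh(F * (1.0 - u) / 2.0) + lncosh(F / 2.0)

def rsb_objective(u, F, beta_c):
    # Bracketed expression of (37), written as a function of u = lambda * n.
    return (-(F * u / 2.0) * math.tanh(beta_c / 2.0)
            - lncosh(F * (1.0 - u) / 2.0) + lncosh(F / 2.0))

def argmax_on_grid(f, points=10001):
    # Brute-force maximizer of f over an evenly spaced grid of u in [0, 1].
    grid = [i / (points - 1) for i in range(points)]
    return max(grid, key=f)

# Hypothetical sample values, used only to illustrate the check.
F, beta_c, R = 3.0, 1.8, 0.3
u_rs2 = argmax_on_grid(lambda u: rs2_objective(u, F, R))
u_rsb = argmax_on_grid(lambda u: rsb_objective(u, F, beta_c))
print("RS2 maximizer:", u_rs2, "expected 1/2")
print("1RSB maximizer:", u_rsb, "expected", 1.0 - beta_c / F)
```

The grid search only locates the maximizer of each expression; whether the corresponding phase is actually realized at the chosen $F$ is a separate question, settled by the phase boundaries discussed in the text.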

Summarizing the above results, we finally obtain 

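$$ E_r^{K=1}(R) = \begin{cases} -R - 2\ln\cosh\dfrac{F}{4} + \ln\cosh\dfrac{F}{2} & \text{(RS2) for } 2\beta_c < F, \\[2mm] \dfrac{\beta_c - F}{2}\tanh\dfrac{\beta_c}{2} - \ln\cosh\dfrac{\beta_c}{2} + \ln\cosh\dfrac{F}{2} & \text{(1RSB) for } \beta_c < F < 2\beta_c, \\[2mm] 0 & \text{(1RSB, } \beta n = 0 \text{) for } F < \beta_c. \end{cases} \tag{39} $$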

After some calculations, we can confirm that the above error exponent obtained by the replica 
analysis is in perfect agreement with Gallager's expression [5, 23], and that it is 
consistent with Shannon's channel coding theorem [3, 4]. 
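
The stated agreement with Gallager's expression can also be checked numerically. The following is a minimal sketch, not the authors' procedure: it evaluates the piecewise form (39) and compares it with the textbook random-coding exponent $\max_{0 \le \rho \le 1} [E_0(\rho) - \rho R]$ of the BSC. It assumes that $F = \ln((1-p)/p)$ for flip probability $p$, that $\beta_c$ is the root of $R + \ln\cosh(\beta/2) - (\beta/2)\tanh(\beta/2) = 0$ as in (34), and that rates are measured in nats; all function names are hypothetical.

```python
import math

def lncosh(x):
    """Numerically stable ln(cosh(x))."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def critical_beta(R, upper=200.0, iters=200):
    """Solve R + ln cosh(b/2) - (b/2) tanh(b/2) = 0 by bisection (0 < R < ln 2)."""
    def gap(b):
        return R + lncosh(b / 2.0) - (b / 2.0) * math.tanh(b / 2.0)
    lo, hi = 0.0, upper            # gap is decreasing: gap(lo) > 0 > gap(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponent_replica(R, F):
    """Piecewise error exponent in the form of (39)."""
    bc = critical_beta(R)
    if F <= bc:                    # below the threshold the exponent vanishes
        return 0.0
    if F < 2.0 * bc:               # 1RSB branch
        return (0.5 * (bc - F) * math.tanh(bc / 2.0)
                - lncosh(bc / 2.0) + lncosh(F / 2.0))
    return -R - 2.0 * lncosh(F / 4.0) + lncosh(F / 2.0)   # RS2 branch

def exponent_gallager(R, F, grid=20001):
    """Grid maximization of E_0(rho) - rho * R over rho in [0, 1] for the BSC."""
    p = 1.0 / (1.0 + math.exp(F))  # assumption: F = ln((1 - p) / p)
    best = 0.0
    for i in range(grid):
        rho = i / (grid - 1)
        s = 1.0 / (1.0 + rho)
        e0 = rho * math.log(2.0) - (1.0 + rho) * math.log(p ** s + (1.0 - p) ** s)
        best = max(best, e0 - rho * R)
    return best

for F in (1.0, 2.0, 3.0, 5.0, 8.0):
    print(F, exponent_replica(0.3, F), exponent_gallager(0.3, F))
```

The two columns should coincide up to the resolution of the grid in $\rho$, with the RS2 branch corresponding to $\rho = 1$ and the 1RSB branch to an interior maximizer.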

In fact, the above result is not a novel one, because the error exponent of some 
models in information theory, including the random code ensemble, has already been 
investigated by using statistical mechanics in some works [37J [39], HO, EJ H2] . However, 
the methods used there are different from the one we proposed. We here emphasize 
some advantages of our method. 

The first is the applicability to larger $K$, which will be demonstrated 
in the next subsection. In such cases, the codewords are mutually correlated and 
the analysis becomes more complicated. Despite this, our scheme can evaluate the 
error exponent without any approximation, except for a slight modification of Gallager's 
original inequality (28). 

Second, as observed above and as will be observed again for larger $K$, the transition of the 
error exponent is the RSB transition between the RS2 and 1RSB phases, which provides a simple 
interpretation of the change in the functional form of the error exponent. This fact has not 
been observed or discussed explicitly before, which might be due to the condition $\lambda = 1/(n+1)$ 
originating from Jensen's and Hölder's inequalities. Moreover, our analysis reveals that 
the RSB transition with respect to the replica number, which cannot be observed from a 
thermodynamic quantity after the quenched average, is significant in channel coding. 
This situation is similar to that of lossy data compression. 

In addition, from the above discussion, we find that the RS1 phase is excluded 
from the contributing phases, which is significant for successful decoding. The details 
will be discussed after the analysis of $K = 2$. 

4.2.2. K = 2 

In this case, the codewords are mutually correlated, which invalidates the 
factorization (30). In such a case, the general form (29) should be evaluated directly, 
as long as Gallager's inequality (28) is used. This calculation can actually be done 
by a novel replica approach, which is somewhat different from the standard one. In 
this approach, we regard the factor in $[\,\cdot\,]_{\mathcal{C}}$ in (29) as the partition function of the 
GDREM with $n + 1$ replicas. However, there are two noteworthy points. First, the 
inverse temperatures are not common to all replicas: one replica out of $n + 1$ has the 
inverse temperature $F(1 - n\lambda)$, and the others have $F\lambda$. This requires a special treatment 
of the one replica with a different temperature in the replica analysis. Second, a correlation 
exists between the special replica and the other $n$ replicas. As seen in 
the summation in (29), the special replica cannot take the same state as any of the other 
$n$ replicas. Hence, we need to introduce an asymmetry between replicas, in addition to 
the RSB among the $n$ non-special replicas. This novel replica method will generally be 
applicable to any other codes with mutual correlation among codewords, provided 
that Gallager's inequality is used. 

We actually applied this novel method to $K = 2$ and obtained the bound of 
the error probability. However, this approach requires rather involved calculations. 
Fortunately, we can avoid it by a slight modification of Gallager's 
original inequality. In this paper, we demonstrate this simpler approach within the 
framework of the ordinary replica analysis of the GDREM for $K = 2$. We confirmed 
that the results from the two schemes coincide with each other. 

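As stated, we use another inequality for the indicator function here, instead of 
Gallager's original inequality (28): 

$$ \Delta_{\rm ML}(\{x_1, x_2(x_1)\}, \boldsymbol{y} | \mathcal{C}) \le \left( \sum_{\tilde{x}_2(x_1) \ne x_2(x_1)} \left( \frac{P(\boldsymbol{y} | \{x_1, \tilde{x}_2(x_1)\})}{P(\boldsymbol{y} | \{x_1, x_2(x_1)\})} \right)^{\lambda_1} \right)^{n_1} + \left( \sum_{\{\tilde{x}_1, \tilde{x}_2(\tilde{x}_1)\} \in \mathcal{C} \backslash \{x_1, \tilde{x}_2(x_1)\}} \left( \frac{P(\boldsymbol{y} | \{\tilde{x}_1, \tilde{x}_2(\tilde{x}_1)\})}{P(\boldsymbol{y} | \{x_1, x_2(x_1)\})} \right)^{\lambda_2} \right)^{n_2}, \tag{40} $$

where the codeword $\boldsymbol{x}$ is represented by the hierarchical components as $\boldsymbol{x} = \{x_1, x_2(x_1)\}$, 
and $\{x_1, \tilde{x}_2(x_1)\}$ is a codeword whose first block is the same as that of the correct 
transmission codeword $\boldsymbol{x}$. Note that the second-block codeword depends on the 
first-block one, which is denoted by $x_2(x_1)$. 

Substituting (40) into (27), we can assess the upper bound of the error probability. 
The contribution from the first term of (40) is equal to 

$$ \left( \frac{1}{\cosh(F/2)} \right)^{\alpha_2 N} \left[ \exp\left\{ -F(1 - \lambda_1 n_1) \left( d_H(x_2, \boldsymbol{0}) - \frac{\alpha_2 N}{2} \right) \right\} \left( \sum_{\tilde{x}_2 \ne x_2} \exp\left\{ -F\lambda_1 \left( d_H(\tilde{x}_2, \boldsymbol{0}) - \frac{\alpha_2 N}{2} \right) \right\} \right)^{n_1} \right]_{\mathcal{C}_2}, \tag{41} $$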

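where we express the entire codebook $\mathcal{C}$ by the hierarchical codebooks as $\mathcal{C} = (\mathcal{C}_1, \mathcal{C}_2)$, 
and then perform the summation and ensemble average with respect to the first hierarchy, 
$\sum_{x_1}$ and $[\,\cdot\,]_{\mathcal{C}_1}$. The sizes of codebooks $\mathcal{C}_1$ and $\mathcal{C}_2$ are and a^, respectively. 
The statistical independence between two different codebooks in the first hierarchy 
is essential for deriving expression (41). The absence of correlation between $x_2$ and 
$\tilde{x}_2 \ne x_2$ is necessary as well. From the result of $K = 1$, this contribution (41) is simply 
expressed as $\exp(-\alpha_2 N E_r^{K=1}(R_2))$. 

The contribution from the second term is 

$$ \left( \frac{1}{\cosh(F/2)} \right)^{\alpha N} \frac{1}{2^N} \left[ \sum_{\{x_1, x_2(x_1)\} \in \mathcal{C}} \exp\left\{ -F(1 - \lambda_2 n_2) \left( d_H(\{x_1, x_2(x_1)\}, \boldsymbol{0}) - \frac{\alpha N}{2} \right) \right\} \left( \sum_{\{\tilde{x}_1, \tilde{x}_2(\tilde{x}_1)\} \in \mathcal{C} \backslash \{x_1, \tilde{x}_2(x_1)\}} \exp\left\{ -F\lambda_2 \left( d_H(\{\tilde{x}_1, \tilde{x}_2(\tilde{x}_1)\}, \boldsymbol{0}) - \frac{\alpha N}{2} \right) \right\} \right)^{n_2} \right]_{\mathcal{C}}. \tag{42} $$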

To derive this expression, the absence of correlation between $x_1$ and $\tilde{x}_1 \ne x_1$ is used in a 
similar manner. As a result, this contribution is represented by the generating function 
of the GDREM for $K = 2$. Denoting this contribution by $\exp(-\alpha N E_r^{K=2}(R, R_1, R_2))$, 
we can write the exponent as 

$$ E_r^{K=2}(R, R_1, R_2) = \max_{0<n,\lambda} \left\{ -\frac{1}{\alpha} \phi(F\lambda, n) - \ln\cosh\frac{F(1-\lambda n)}{2} + \ln\cosh\frac{F}{2} \right\}. \tag{43} $$

Note that $\phi(F\lambda, n)$ in (43) is for the $K = 2$ case. Hence, the bound of the error 
probability for $K = 2$ is expressed as 

$$ [P_e(\mathcal{C})]_{\mathcal{C}} \le \exp\left( -\alpha_2 N E_r^{K=1}(R_2) \right) + \exp\left( -\alpha N E_r^{K=2}(R, R_1, R_2) \right), \tag{44} $$

for $N \to \infty$. Therefore, we must compare the two contributions by computing $E_r^{K=1}(R_2)$ 
and $E_r^{K=2}(R, R_1, R_2)$. 

Here we give a comment on this result. For $K = 2$, an intuitive discussion in [21] suggests 
the following result, 

$$ [P_e(\mathcal{C})]_{\mathcal{C}} \le \exp\left( -\alpha_2 N E_r^{K=1}(R_2) \right) + \exp\left( -\alpha N E_r^{K=1}(R) \right), \tag{45} $$

for $N \to \infty$, which differs from (44). However, as mentioned later, the difference between 
(44) and (45) is irrelevant because both bounds are identical for $N \to \infty$. Despite 
this, we consider our result more natural and suggestive, for the following reason. 
In the case of the hierarchical random code ensemble, we can expect that 
the error probability can be decomposed into failure events from the respective hierarchies. 
Our result (44), based on (40), clearly reflects this feature. In the same way, we can 
expect the bound of the error probability for general $K$ to be 

$$ [P_e(\mathcal{C})]_{\mathcal{C}} \le \sum_{\nu=1}^{K} \exp\left( -\sum_{j=0}^{\nu-1} \alpha_{K-j} N E_r^{K=\nu} \right), \tag{46} $$

for $N \to \infty$. This expression should be confirmed in future work. 

Next we compute the bound (44) using the result of the replica analysis. The first 
term has already been estimated in the analysis of $K = 1$, and here we evaluate the 
second term. As stated in section 2.3, the case of $R_1 (< R) < R_2 < \ln 2$ is dealt with 
here. The details of the analysis are given in Appendix A. We summarize the main result in the 
following. 

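The phases contributing to the error exponent are the RS2-RS2, RS2-1RSB and 2RSB phases. 
The error exponent is obtained as 

$$ E_r^{K=2}(R, R_1, R_2) = \begin{cases} -R - 2\ln\cosh\dfrac{F}{4} + \ln\cosh\dfrac{F}{2} & \text{(RS2-RS2) for } 2\beta_2 < F, \\[2mm] -\dfrac{\alpha_1}{\alpha} R_1 - \dfrac{\alpha_1}{\alpha}\ln\cosh\dfrac{\beta_x}{2} - \dfrac{\alpha_2 \beta_x}{2\alpha}\tanh\dfrac{\beta_2}{2} - \ln\cosh\dfrac{F - \beta_x}{2} + \ln\cosh\dfrac{F}{2} & \text{(RS2-1RSB) for } \beta_1 + \beta_y < F < 2\beta_2, \\[2mm] -\dfrac{F - \beta_y}{2}\tanh\dfrac{\beta_y}{2} - \ln\cosh\dfrac{\beta_y}{2} + \ln\cosh\dfrac{F}{2} & \text{(2RSB) for } \beta_y < F < \beta_1 + \beta_y, \\[2mm] 0 & \text{(2RSB, } \beta n = 0 \text{) for } F < \beta_y, \end{cases} \tag{47} $$

where $\beta_x$ and $\beta_y$ are given by 

$$ \alpha_1 \tanh\frac{\beta_x}{2} + \alpha_2 \tanh\frac{\beta_2}{2} = \alpha \tanh\frac{F - \beta_x}{2}, \qquad \alpha_1 \tanh\frac{\beta_1}{2} + \alpha_2 \tanh\frac{\beta_2}{2} = \alpha \tanh\frac{\beta_y}{2}. \tag{48} $$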

As a result, only the phases on the $\beta n$ axis contribute to the error exponent, as in 
lossy data compression. This property will hold for arbitrary $K$, which simplifies the 
analysis for larger $K$. 

Finally, we must compare the contributions from the first and second terms on the 
right-hand side of (44). We checked this numerically and concluded that the first term 
always dominates when $R_1 < R_2 < \ln 2$. On the other hand, for $R_2 < R_1$, the second 
term dominates and yields the same result as $K = 1$. 
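
A numerical comparison of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it takes the piecewise forms (39) and (47)-(48) at face value, assumes that $\beta_1$ and $\beta_2$ are the roots of $R_j + \ln\cosh(\beta_j/2) - (\beta_j/2)\tanh(\beta_j/2) = 0$, and assumes the overall rate $R = (\alpha_1 R_1 + \alpha_2 R_2)/\alpha$ with $\alpha = \alpha_1 + \alpha_2$. All function names and sample parameters are hypothetical.

```python
import math

def lncosh(x):
    """Numerically stable ln(cosh(x))."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def bisect_root(f, lo, hi, iters=200):
    """Root of an increasing function f with f(lo) <= 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_beta(R):
    # Root of (b/2) tanh(b/2) - ln cosh(b/2) - R, increasing in b (0 < R < ln 2).
    return bisect_root(
        lambda b: (b / 2.0) * math.tanh(b / 2.0) - lncosh(b / 2.0) - R, 0.0, 200.0)

def exponent_k1(R, F):
    """Piecewise exponent in the form of (39)."""
    bc = critical_beta(R)
    if F <= bc:
        return 0.0
    if F < 2.0 * bc:
        return 0.5 * (bc - F) * math.tanh(bc / 2.0) - lncosh(bc / 2.0) + lncosh(F / 2.0)
    return -R - 2.0 * lncosh(F / 4.0) + lncosh(F / 2.0)

def exponent_k2(R1, R2, a1, a2, F):
    """Piecewise exponent in the form of (47), with beta_x, beta_y from (48)."""
    a = a1 + a2
    R = (a1 * R1 + a2 * R2) / a          # assumed relation between R, R1 and R2
    b1, b2 = critical_beta(R1), critical_beta(R2)
    t_y = (a1 * math.tanh(b1 / 2.0) + a2 * math.tanh(b2 / 2.0)) / a
    b_y = 2.0 * math.atanh(t_y)          # second relation of (48)
    if F <= b_y:
        return 0.0
    if F <= b1 + b_y:                    # 2RSB branch
        return -0.5 * (F - b_y) * t_y - lncosh(b_y / 2.0) + lncosh(F / 2.0)
    if F < 2.0 * b2:                     # RS2-1RSB branch, beta_x from (48)
        b_x = bisect_root(
            lambda b: (a1 * math.tanh(b / 2.0) + a2 * math.tanh(b2 / 2.0)
                       - a * math.tanh((F - b) / 2.0)), b1, b2)
        return (-(a1 / a) * (R1 + lncosh(b_x / 2.0))
                - (a2 / a) * (b_x / 2.0) * math.tanh(b2 / 2.0)
                - lncosh((F - b_x) / 2.0) + lncosh(F / 2.0))
    return -R - 2.0 * lncosh(F / 4.0) + lncosh(F / 2.0)   # RS2-RS2 branch

# Hypothetical sample parameters with R1 < R2 < ln 2.
R1, R2, a1, a2 = 0.1, 0.4, 0.5, 0.5
for F in (1.0, 2.0, 3.5, 6.0):
    first = a2 * exponent_k1(R2, F)                      # decay rate of the first term
    second = (a1 + a2) * exponent_k2(R1, R2, a1, a2, F)  # decay rate of the second term
    print(F, first, second)
```

The term with the smaller decay rate dominates the bound for large $N$, so the first term dominating corresponds to `first <= second` in this output.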

4.3. Discussion 

Now we are ready to compare the performances of $K = 1$ (non-hierarchical) and $K = 2$ 
for $R_1 < R_2 < \ln 2$. Comparing the dominant contribution $\exp(-\alpha_2 N E_r^{K=1}(R_2))$ for 
$K = 2$ with the non-hierarchical result $\exp(-\alpha N E_r^{K=1}(R))$, we found that the error 
exponent of the non-hierarchical code always surpasses that of the hierarchical code for 
fixed $N$. Therefore, the hierarchy degrades the decoding performance for $R_1 < R_2$. 
Although this conclusion is the same as that of Merhav's discussion [21], our formulation has 
crucial advantages in the way of reasoning. 
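
The comparison between the two decay rates can be illustrated with a short script. This is a sketch under assumptions, not the original computation: it uses the piecewise form (39) for $E_r^{K=1}$, assumes $\beta_c$ is the root of $R + \ln\cosh(\beta/2) - (\beta/2)\tanh(\beta/2) = 0$, and assumes $R = (\alpha_1 R_1 + \alpha_2 R_2)/\alpha$; the function names and sample values are hypothetical.

```python
import math

def lncosh(x):
    """Numerically stable ln(cosh(x))."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def critical_beta(R, upper=200.0, iters=200):
    """Solve R + ln cosh(b/2) - (b/2) tanh(b/2) = 0 by bisection (0 < R < ln 2)."""
    lo, hi = 0.0, upper
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gap = R + lncosh(mid / 2.0) - (mid / 2.0) * math.tanh(mid / 2.0)
        if gap > 0.0:              # the gap decreases with b, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponent_k1(R, F):
    """Piecewise exponent in the form of (39)."""
    bc = critical_beta(R)
    if F <= bc:
        return 0.0
    if F < 2.0 * bc:
        return 0.5 * (bc - F) * math.tanh(bc / 2.0) - lncosh(bc / 2.0) + lncosh(F / 2.0)
    return -R - 2.0 * lncosh(F / 4.0) + lncosh(F / 2.0)

def decay_rates(R1, R2, a1, a2, F):
    """Return (hierarchical, non-hierarchical) decay rates per unit N."""
    a = a1 + a2
    R = (a1 * R1 + a2 * R2) / a    # assumed overall rate of the code
    return a2 * exponent_k1(R2, F), a * exponent_k1(R, F)

# Hypothetical sample parameters with R1 < R2 < ln 2.
for F in (2.0, 3.5, 6.0):
    hier, flat = decay_rates(0.1, 0.4, 0.5, 0.5, F)
    print(F, hier, flat)
```

Since $E_r^{K=1}$ decreases with the rate and $\alpha_2 < \alpha$, the hierarchical decay rate in this sketch cannot exceed the non-hierarchical one when $R < R_2$.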

Our approach never loosens the bound of the error exponent, except through 
Gallager's inequality (28). This is achieved with the aid of the replica method, 
and furthermore the two-parameter optimization with respect to $\lambda$ and $n$ can be 
reasonably conducted. In conventional approaches, several inequalities, such as Hölder's 
and Jensen's inequalities, are employed, as in [21]. However, in such analyses the parameter 
optimization is usually performed only with respect to $n$, by fixing $\lambda$ as $\lambda = 1/(n+1)$. 
These manipulations not only loosen the bound of the error exponent but also obscure 
the origin of the transitions of the error exponent. Without such risks, our formulation 
enables us to analyze the performance of random codes in detail. As a result, some 
physical implications of the behavior of the error exponent can be extracted as follows. 

We know that the phases contributing to the error exponent always include the 
RS2 phase and/or the 1RSB phase in their hierarchy, and that the RS1 phase is excluded. 
The reason is probably as follows. In a successful case of decoding, the 
Gibbs measure is expected to concentrate on a certain input signal. 
In the RS2/1RSB phases such concentration actually occurs: for the RS2 phase a single 
state is chosen by definition (see figure 2 or [22, 21]), and for the 1RSB phase the measure 
concentration occurs due to its glassy nature. On the other hand, for the RS1 phase such 
concentration does not occur, that is, the phase is paramagnetic, which corresponds to 
inefficient decoding. Consequently, we do not need to consider the RS1 phase in the 
discussion of optimal decoding. We expect this argument to be applicable to a general code 
ensemble. 

The above observation also offers some benefits in the practical analysis of $\phi(\beta, n)$: 
this function will always be written as a function of the product $\beta n$ for the contributing 
RS2/1RSB phases. This is due to the measure concentration for successful decoding, 
in which case the replicated partition function is written in terms of the product $\beta n$. From 
this discussion, we also conclude that the condition $\lambda = 1/(n+1)$ for the one-parameter 
optimization, which is used in the original discussion [5, 23] and is valid there, sometimes 
yields a looser upper bound than the two-parameter optimization. Actually, if we put 
$\lambda = 1/(n+1)$ in (43), the RS1 phase becomes included in the final solution, and we 
obtain a looser bound than (47), although it does not contribute to the error probability 
for $N \to \infty$ due to the dominance of the first term $\exp(-\alpha_2 N E_r^{K=1}(R_2))$ in the current 
situation. Hence, the one-parameter optimization used in Gallager's work should be 
carefully examined. This observation will also be helpful in other problems of channel 
coding. 

5. Conclusion 

We investigated the hierarchical random code ensemble by using the direct relation 
with the GDREM. We sketched how the replica analysis is carried out and is useful 
for large deviation analysis, namely computations of the distortion rate in lossy data 
compression and Gallager's error exponent in channel coding. We provided formulae for 
these quantities and demonstrated how they are evaluated in the case of two hierarchy 
levels. For lossy data compression, the distortion rate from the replica analysis is in 
perfect agreement with [21]. Using our method, we could calculate the distortion rate 
quite readily, which is one of its advantages. We also interpreted the behavior 
of the distortion rate in terms of the complexity, and found that the emergence of 
higher-step RSB degrades the performance of data compression. From the relation 
between the complexity and the RSB transition, this conclusion will hold for a general 
hierarchical code, which is helpful in designing codes. In addition, we obtained a novel 
result for channel coding. The procedure to compute the upper bound is different from 
Gallager's argument. This difference arises from the correlation between codewords 
in the replica analysis. Our result from the two-parameter optimization seems quite 
natural because the measure concentration is associated with optimal performance, as 
we discussed. 

In both problems, we argued that the RSB transition with respect to the replica 
number is significant. Although our analysis is based on the mapping between two 
fundamental models in signal processing and spin glasses, we expect that the 
proposed method can be applied to other ensembles or problems in signal processing. We also 
hope that the observations in this paper on data compression and channel coding from 
the viewpoint of the RSB will be of use in other problems. Such applications are left 
for future work. 

Appendix A. Error exponent for K = 2 

We evaluate the error exponent in each phase for the case $K = 2$ by using (43) and the 
result of the replica analysis (15). 

• RS1-RS1 

This gives the same result as the RS1 of $K = 1$. 

• RS2-RS1 

From (15), 

$$ E_r^{K=2}(R, R_1, R_2) = \max_{0<n,\lambda} \left\{ -\frac{\alpha_1}{\alpha}\left( R_1 + \ln\cosh\frac{F n \lambda}{2} \right) - n \frac{\alpha_2}{\alpha}\left( R_2 + \ln\cosh\frac{F\lambda}{2} \right) - \ln\cosh\frac{F(1 - n\lambda)}{2} + \ln\cosh\frac{F}{2} \right\}. \tag{A.1} $$

Two extremization conditions give $F\lambda = \beta_2$, which means that the extremization 
region is always on the boundary between the RS2-RS1 and RS2-1RSB phases, and 

$$ \frac{\alpha_1}{\alpha}\tanh\frac{\beta_2 n}{2} + \frac{\alpha_2}{\alpha}\tanh\frac{\beta_2}{2} = \tanh\frac{F - \beta_2 n}{2}. \tag{A.2} $$

The error exponent is rewritten as 

$$ E_r^{K=2}(R, R_1, R_2) = -\frac{\alpha_1}{\alpha}\left( R_1 + \ln\cosh\frac{\beta_2 n}{2} \right) - \frac{\alpha_2}{\alpha}\frac{\beta_2 n}{2}\tanh\frac{\beta_2}{2} - \ln\cosh\frac{F - \beta_2 n}{2} + \ln\cosh\frac{F}{2}, \tag{A.3} $$

where $n$ is the solution of (A.2). As mentioned, the extremization region is on the 
phase boundary, and this result can be included in the case of the RS2-1RSB. 

• RS2-RS2 

This gives the same result as the RS2 for $K = 1$. 
• 1RSB-RS1 

From (15), 

$$ E_r^{K=2}(R, R_1, R_2) = \max_{0<n,\lambda} \left\{ -\frac{\alpha_1}{\alpha}\frac{F n \lambda}{2}\tanh\frac{\beta_1}{2} - n \frac{\alpha_2}{\alpha}\left( R_2 + \ln\cosh\frac{F\lambda}{2} \right) - \ln\cosh\frac{F(1 - n\lambda)}{2} + \ln\cosh\frac{F}{2} \right\}. \tag{A.4} $$

From two extremization conditions we have $F\lambda = \beta_2$, which means that the 
extremization region is always on the boundary between the 1RSB-RS1 and 2RSB 
phases, and 

$$ \frac{\alpha_1}{\alpha}\tanh\frac{\beta_1}{2} + \frac{\alpha_2}{\alpha}\tanh\frac{\beta_2}{2} = \tanh\frac{F - \beta_2 n}{2}. \tag{A.5} $$

Then the error exponent becomes 

$$ E_r^{K=2}(R, R_1, R_2) = -\frac{\beta_2 n}{2}\left( \frac{\alpha_1}{\alpha}\tanh\frac{\beta_1}{2} + \frac{\alpha_2}{\alpha}\tanh\frac{\beta_2}{2} \right) - \ln\cosh\frac{F - \beta_2 n}{2} + \ln\cosh\frac{F}{2}, \tag{A.6} $$

where $n$ is the solution of (A.5). As stated, the extremization region is on the phase 
boundary. This result can be included in the case of the 2RSB. 

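• RS2-1RSB 

From (15), 

$$ E_r^{K=2}(R, R_1, R_2) = \max_{0<n,\lambda} \left\{ -\frac{\alpha_1}{\alpha}\left( R_1 + \ln\cosh\frac{F\lambda n}{2} \right) - \frac{\alpha_2}{\alpha}\frac{F\lambda n}{2}\tanh\frac{\beta_2}{2} - \ln\cosh\frac{F(1 - \lambda n)}{2} + \ln\cosh\frac{F}{2} \right\}. \tag{A.7} $$

The extremization condition with respect to $\lambda n$ gives 

$$ \frac{\alpha_1}{\alpha}\tanh\frac{F\lambda n}{2} + \frac{\alpha_2}{\alpha}\tanh\frac{\beta_2}{2} = \tanh\frac{F(1 - \lambda n)}{2}. \tag{A.8} $$

Substituting the solution of (A.8), we have a part of the solution (47). 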

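• 2RSB 

From (15), 

$$ E_r^{K=2}(R, R_1, R_2) = \max_{0<n,\lambda} \left\{ -\frac{F\lambda n}{2}\left( \frac{\alpha_1}{\alpha}\tanh\frac{\beta_1}{2} + \frac{\alpha_2}{\alpha}\tanh\frac{\beta_2}{2} \right) - \ln\cosh\frac{F(1 - \lambda n)}{2} + \ln\cosh\frac{F}{2} \right\}. \tag{A.9} $$

From the extremization with respect to $\lambda n$, 

$$ \frac{\alpha_1}{\alpha}\tanh\frac{\beta_1}{2} + \frac{\alpha_2}{\alpha}\tanh\frac{\beta_2}{2} = \tanh\frac{F(1 - \lambda n)}{2}. \tag{A.10} $$

Inserting the solution of (A.10), we have a part of the solution (47). 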
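
The two-parameter extremization described for the RS2-RS1 case can be checked numerically. The sketch below is an illustration under assumptions, not the authors' calculation: it takes the bracketed expression of (A.1) in the form $-(\alpha_1/\alpha)(R_1 + \ln\cosh(Fn\lambda/2)) - n(\alpha_2/\alpha)(R_2 + \ln\cosh(F\lambda/2)) - \ln\cosh(F(1-n\lambda)/2) + \ln\cosh(F/2)$, assumes that $\beta_2$ solves $R_2 + \ln\cosh(\beta_2/2) - (\beta_2/2)\tanh(\beta_2/2) = 0$, and verifies by central differences that the gradient vanishes at $F\lambda = \beta_2$ with $n$ taken from (A.2). Function names and sample values are hypothetical.

```python
import math

def lncosh(x):
    """Numerically stable ln(cosh(x))."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def bisect_root(f, lo, hi, iters=200):
    """Root of an increasing function f with f(lo) <= 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_beta(R):
    # Root of (b/2) tanh(b/2) - ln cosh(b/2) - R, increasing in b (0 < R < ln 2).
    return bisect_root(
        lambda b: (b / 2.0) * math.tanh(b / 2.0) - lncosh(b / 2.0) - R, 0.0, 200.0)

def objective(n, lam, F, R1, R2, a1, a2):
    # Assumed bracketed expression of (A.1) for the RS2-RS1 case.
    a = a1 + a2
    return (-(a1 / a) * (R1 + lncosh(F * lam * n / 2.0))
            - n * (a2 / a) * (R2 + lncosh(F * lam / 2.0))
            - lncosh(F * (1.0 - n * lam) / 2.0) + lncosh(F / 2.0))

def stationary_point(F, R2, a1, a2):
    a = a1 + a2
    b2 = critical_beta(R2)
    lam = b2 / F                           # condition F * lambda = beta_2
    # (A.2) written as left side minus right side, increasing in n.
    n = bisect_root(
        lambda m: ((a1 * math.tanh(b2 * m / 2.0) + a2 * math.tanh(b2 / 2.0)) / a
                   - math.tanh((F - b2 * m) / 2.0)), 0.0, F / b2)
    return n, lam

def gradient(F, R1, R2, a1, a2, eps=1e-6):
    """Central-difference gradient of the objective at the candidate point."""
    n, lam = stationary_point(F, R2, a1, a2)
    dn = (objective(n + eps, lam, F, R1, R2, a1, a2)
          - objective(n - eps, lam, F, R1, R2, a1, a2)) / (2.0 * eps)
    dl = (objective(n, lam + eps, F, R1, R2, a1, a2)
          - objective(n, lam - eps, F, R1, R2, a1, a2)) / (2.0 * eps)
    return dn, dl

# Hypothetical sample parameters.
print(gradient(3.5, 0.1, 0.4, 0.5, 0.5))
```

Both components of the printed gradient should be zero up to finite-difference error, which is consistent with the extremum lying on the line $F\lambda = \beta_2$.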